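As several industries move toward modeling massive 3D virtual worlds, the need for content creation tools that can scale in the quantity, quality, and diversity of 3D content is becoming evident. In our work, we aim to train performant 3D generative models that synthesize textured meshes which can be consumed directly by 3D rendering engines and are therefore immediately usable in downstream applications. Prior work on 3D generative modeling either lacks geometric detail, is limited in the mesh topology it can produce, typically does not support textures, or uses neural renderers in the synthesis process, which makes it difficult to use in common 3D software. In this work, we introduce GET3D, a generative model that directly generates explicit textured 3D meshes with complex topology, rich geometric detail, and high-fidelity textures. We bridge recent successes in differentiable surface modeling, differentiable rendering, and 2D generative adversarial networks to train our model from 2D image collections. GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, motorbikes, and human characters to buildings, achieving significant improvements over previous methods.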
translated by 谷歌翻译
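Recent advances in 3D semantic segmentation with deep neural networks have shown remarkable success, with rapid performance gains on available datasets. However, current 3D semantic segmentation benchmarks contain only a small number of categories (fewer than 30 for ScanNet and SemanticKITTI, for instance), which is not enough to reflect the diversity of real environments (semantic image understanding, by comparison, covers hundreds to thousands of classes). We therefore propose to study a larger vocabulary for 3D semantic segmentation, with a new extended benchmark on ScanNet data with 200 class categories, an order of magnitude more than previously studied. This large number of class categories also induces a large natural class imbalance, and both properties are challenging for existing 3D semantic segmentation methods. To learn more robust 3D features in this setting, we propose a language-driven pre-training method that encourages learned 3D features, which may have only limited training examples, to lie close to their pre-trained text embeddings. Extensive experiments show that our approach consistently outperforms state-of-the-art 3D pre-training for 3D semantic segmentation on our proposed benchmark (+9% relative mIoU), including in limited-data scenarios, with +25% relative mIoU when using only 5% of the annotations.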
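The idea of pulling 3D features toward pre-trained text embeddings can be illustrated with a minimal sketch. The abstract does not state the loss, so the cosine-distance objective below is an assumption, as are the function name `text_anchor_loss`, the feature dimension, and the random vectors that stand in for the output of a frozen text encoder.

```python
import numpy as np

def text_anchor_loss(point_feats, point_labels, text_embeds):
    """Mean cosine distance between each 3D feature and the text embedding of its class.

    Hypothetical objective for illustration only; not taken from the paper.
    """
    f = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    cos_sim = np.sum(f * t[point_labels], axis=1)
    return float(np.mean(1.0 - cos_sim))

rng = np.random.default_rng(0)
num_classes, dim = 200, 32                            # 200 classes as in the benchmark; dim is assumed
text_embeds = rng.normal(size=(num_classes, dim))     # stand-in for frozen text embeddings
labels = rng.integers(0, num_classes, size=1000)      # per-point class labels

# Features already near their class anchor versus unrelated random features.
feats_close = text_embeds[labels] + 0.1 * rng.normal(size=(1000, dim))
feats_random = rng.normal(size=(1000, dim))

loss_close = text_anchor_loss(feats_close, labels, text_embeds)
loss_random = text_anchor_loss(feats_random, labels, text_embeds)
```

Because the text anchors are fixed, a rare class with few 3D training examples still has a well-defined target in feature space, which is the intuition the abstract describes.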
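We present Mix3D, a data augmentation technique for segmenting large-scale 3D scenes. Because scene context helps in reasoning about object semantics, current work focuses on models with large capacity and large receptive fields that can fully capture the global context of an input 3D scene. However, strong contextual priors can have detrimental effects, such as failing to recognize a pedestrian crossing the street. In this work, we focus on the importance of balancing global scene context and local geometry, with the goal of generalizing beyond the contextual priors in the training set. In particular, we propose a "mixing" technique that creates new training samples by combining two augmented scenes. By doing so, object instances are implicitly placed into novel out-of-context environments, which makes it harder for models to rely on scene context alone and encourages them to infer semantics from local structure as well. We perform a detailed analysis to understand the importance of global context, local structure, and the effect of mixing scenes. In experiments, we show that models trained with Mix3D benefit from a significant performance boost on indoor (ScanNet, S3DIS) and outdoor (SemanticKITTI) datasets. Mix3D can be used trivially with any existing method; for example, trained with Mix3D, MinkowskiNet outperforms all prior state-of-the-art methods by a significant margin on the ScanNet test benchmark, reaching 78.1 mIoU. Code is available at: https://nekrasov.dev/mix3d/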
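A minimal sketch of the "mixing" step, assuming each scene is an `(N, 3)` point array with one label per point. The specific augmentation used here (centering at the origin plus a random rotation about the z-axis) and the function names are illustrative assumptions, not details given in the abstract.

```python
import numpy as np

def augment(points, rng):
    """Center the scene at the origin and rotate it by a random angle about the z-axis."""
    centered = points - points.mean(axis=0, keepdims=True)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return centered @ rot_z.T

def mix_scenes(points_a, labels_a, points_b, labels_b, rng):
    """Create one training sample as the union of two augmented scenes."""
    mixed_points = np.concatenate([augment(points_a, rng), augment(points_b, rng)], axis=0)
    mixed_labels = np.concatenate([labels_a, labels_b], axis=0)
    return mixed_points, mixed_labels

rng = np.random.default_rng(0)
scene_a = rng.normal(size=(100, 3)) + 5.0   # toy scenes with different origins
scene_b = rng.normal(size=(60, 3)) - 3.0
labels_a = rng.integers(0, 20, size=100)
labels_b = rng.integers(0, 20, size=60)

mixed_points, mixed_labels = mix_scenes(scene_a, labels_a, scene_b, labels_b, rng)
```

Centering both scenes makes them overlap in space, so objects from one scene end up surrounded by geometry from the other. Labels are carried over unchanged, so no new annotation is needed.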
Spectral geometric methods have brought revolutionary changes to the field of geometry processing. Of particular interest is the study of the Laplacian spectrum as a compact, isometry- and permutation-invariant representation of a shape. Some recent works show how the intrinsic geometry of a full shape can be recovered from its spectrum, but there are no approaches that consider the more challenging problem of recovering the geometry from the spectral information of partial shapes. In this paper, we propose a possible way to fill this gap. We introduce a learning-based method to estimate the Laplacian spectrum of the union of partial non-rigid 3D shapes, without actually computing the 3D geometry of the union or any correspondence between those partial shapes. We do so by operating purely in the spectral domain and by defining the union operation between short sequences of eigenvalues. We show that the approximated union spectrum can be used as-is to reconstruct the complete geometry [MRC*19], perform region localization on a template [RTO*19], and retrieve shapes from a database, generalizing ShapeDNA [RWP06] to work with partiality. Working with eigenvalues allows us to deal with unknown correspondence, different sampling, and different discretizations (point clouds and meshes alike), making this operation especially robust and general. Our approach is data-driven and can generalize to isometric and non-isometric deformations of the surface, as long as these stay within the same semantic class (e.g., human bodies or horses), as well as to partiality artifacts not seen at training time.
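The permutation invariance of the Laplacian spectrum can be checked numerically. The sketch below uses the combinatorial graph Laplacian of a small hand-made graph as a stand-in for a discretized Laplace-Beltrami operator; that substitution, the toy edge list, and the helper name `graph_laplacian` are assumptions made for illustration.

```python
import numpy as np

def graph_laplacian(num_vertices, edges):
    """Combinatorial graph Laplacian L = D - A of an undirected graph."""
    lap = np.zeros((num_vertices, num_vertices))
    for i, j in edges:
        lap[i, j] -= 1.0
        lap[j, i] -= 1.0
        lap[i, i] += 1.0
        lap[j, j] += 1.0
    return lap

# A small connected graph: a triangle with a two-edge tail.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
lap = graph_laplacian(5, edges)

# Relabel the vertices with a random permutation and rebuild the operator.
rng = np.random.default_rng(0)
perm = rng.permutation(5)
relabeled_edges = [(perm[i], perm[j]) for i, j in edges]
lap_perm = graph_laplacian(5, relabeled_edges)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
spectrum = np.linalg.eigvalsh(lap)
spectrum_perm = np.linalg.eigvalsh(lap_perm)
```

The two sorted eigenvalue sequences agree up to floating-point error even though the matrices differ, which is why a short sequence of eigenvalues can serve as a shape signature without any vertex correspondence.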
Arguably one of the top success stories of deep learning is transfer learning. The finding that pre-training a network on a rich source set (e.g., ImageNet) can help boost performance once fine-tuned on a usually much smaller target set has been instrumental to many applications in language and vision. Yet very little is known about its usefulness in 3D point cloud understanding. We see this as an opportunity, considering the effort required for annotating data in 3D. In this work, we aim to facilitate research on 3D representation learning. Unlike previous works, we focus on high-level scene understanding tasks. To this end, we select a suite of diverse datasets and tasks to measure the effect of unsupervised pre-training on a large source set of 3D scenes. Our findings are extremely encouraging: using a unified triplet of architecture, source dataset, and contrastive loss for pre-training, we achieve improvement over recent best results in segmentation and detection across 6 different benchmarks for indoor and outdoor, real and synthetic datasets, demonstrating that the learned representation can generalize across domains. Furthermore, the improvement was similar to supervised pre-training, suggesting that future efforts should favor scaling data collection over more detailed annotation. We hope these findings will encourage more research on unsupervised pretext task design for 3D deep learning. Our code is publicly available at https://github.com/facebookresearch/PointContrast
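As a rough illustration of contrastive pre-training on matched points, the sketch below computes a generic InfoNCE-style loss between features of the same points seen in two views. The abstract only says that a contrastive loss is used, so this particular form, the temperature value, and the function name `info_nce_loss` are assumptions and should not be read as the authors' implementation.

```python
import numpy as np

def info_nce_loss(feats_view1, feats_view2, temperature=0.07):
    """InfoNCE over N matched points: row i of each view is a positive pair,
    and every other row of the second view acts as a negative."""
    f1 = feats_view1 / np.linalg.norm(feats_view1, axis=1, keepdims=True)
    f2 = feats_view2 / np.linalg.norm(feats_view2, axis=1, keepdims=True)
    logits = f1 @ f2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 16))                        # toy per-point features
feats_other_view = feats + 0.01 * rng.normal(size=(32, 16))

loss_aligned = info_nce_loss(feats, feats_other_view)    # correct correspondences
loss_shuffled = info_nce_loss(feats, feats_other_view[::-1])  # wrong correspondences
```

Minimizing such a loss pushes features of the same 3D point in different views together and features of different points apart, without requiring any semantic labels.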
Current 3D object detection methods are heavily influenced by 2D detectors. In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids (i.e., to voxel grids or to bird's eye view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to directly detect objects in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data that is as generic as possible. However, due to the sparse nature of the data (samples from 2D manifolds in 3D space), we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point and is thus hard to regress accurately in one step. To address the challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. Our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D, with a simple design, compact model size, and high efficiency. Remarkably, VoteNet outperforms previous methods by using purely geometric information without relying on color images.
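The voting idea can be sketched on a toy object. In the snippet below, points sampled on a unit sphere play the role of surface points, and the learned offset prediction is replaced by the true offset plus Gaussian noise; both choices, along with all variable names, are assumptions for illustration rather than parts of VoteNet itself.

```python
import numpy as np

rng = np.random.default_rng(0)
center = np.array([2.0, -1.0, 0.5])          # object centroid, located in empty space

# Surface samples: points on a unit sphere around the centroid.
directions = rng.normal(size=(500, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
surface_points = center + directions

# Every surface point is a full radius away from the centroid.
min_surface_dist = np.linalg.norm(surface_points - center, axis=1).min()

# Stand-in for a network: each point predicts a noisy offset toward the centroid.
predicted_offsets = (center - surface_points) + rng.normal(scale=0.1, size=(500, 3))
votes = surface_points + predicted_offsets

# Aggregating the votes gives a centroid estimate much tighter than any single vote.
estimated_center = votes.mean(axis=0)
center_error = np.linalg.norm(estimated_center - center)
```

Although no input point lies near the centroid, the votes cluster around it, so later stages can aggregate features at a location close to the object center instead of regressing it from a distant surface point in one step.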